This paper presents ReasonFormer, a unified reasoning framework for mirroring the modular and compositional reasoning process of humans in complex decision-making. Inspired by dual-process theory in cognitive science, the representation module (automatic thinking) and reasoning modules (controlled thinking) are decoupled to capture different levels of cognition. Upon the top of the representation module, the pre-trained reasoning modules are modular and professional in specific and fundamental reasoning skills (e.g., logic, simple QA, etc). To mimic the controlled compositional thinking process, different reasoning modules are dynamically activated and composed in both parallel and cascaded manners to control what reasoning skills are activated and how deep the reasoning process will be reached to solve the current problems. The unified reasoning framework solves multiple tasks with a single model, and is trained and inferred in an end-to-end manner. Evaluated on 11 datasets requiring different reasoning skills and complexity, ReasonFormer demonstrates substantial performance boosts, revealing the compositional reasoning ability. Few-shot experiments exhibit better generalization ability by learning to compose pre-trained skills for new tasks with limited data, and decoupling the representation module and the reasoning modules. Further analysis shows the modularity of reasoning modules as different tasks activate distinct reasoning skills at different reasoning depths.
translated by 谷歌翻译
情感识别技术使计算机能够将人类情感状态分类为离散类别。但是,即使在短时间内,情绪也可能波动,而不是保持稳定状态。由于其3-D拓扑结构,也很难全面使用EEG空间分布。为了解决上述问题,我们在本研究中提出了一个本地时间空间模式学习图表网络(LTS-GAT)。在LTS-GAT中,使用划分和串扰方案来检查基于图形注意机制的脑电图模式的时间和空间维度的局部信息。添加了动力域歧视器,以提高针对脑电图统计数据的个体间变化的鲁棒性,以学习不同参与者的鲁棒性脑电图特征表示。我们在两个公共数据集上评估了LTS-GAT,用于在个人依赖和独立范式下进行情感计算研究。与其他现有主流方法相比,LTS-GAT模型的有效性被证明。此外,使用可视化方法来说明不同大脑区域和情绪识别的关系。同时,还对不同时间段的权重进行了可视化,以研究情绪稀疏问题。
translated by 谷歌翻译
任务概括是自然语言处理(NLP)的漫长挑战。最近的研究试图通过将NLP任务映射到人类可读的提示形式中来提高预训练语言模型的任务概括能力。但是,这些方法需要费力且不灵活的提示,并且在同一下游任务上的不同提示可能会获得不稳定的性能。我们提出了统一的架构提示,这是一种灵活且可扩展的提示方法,该方法会根据任务输入架构自动自动自定义每个任务的可学习提示。它在任务之间建模共享知识,同时保持不同任务架构的特征,从而增强任务概括能力。架构提示采用每个任务的明确数据结构,以制定提示,因此涉及几乎没有人类的努力。为了测试模式提示的任务概括能力,我们对各种一般NLP任务进行基于模式提示的多任务预训练。该框架在从8种任务类型(例如QA,NLI等)的16个看不见的下游任务上实现了强劲的零射击和很少的概括性能。此外,全面的分析证明了每个组件在架构提示中的有效性,其在任务组成性方面的灵活性以及在全DATA微调设置下提高性能的能力。
translated by 谷歌翻译
由于复杂的背景和文本实例的不同变化,场景文本识别是一项具有挑战性的任务。在本文中,我们提出了一个新颖的语义gan和平衡的注意网络(SGBANET),以识别场景图像中的文本。提出的方法首先使用语义gan生成简单的语义功能,然后使用平衡的注意模块识别场景文本。语义GAN旨在使支持域和目标域之间的语义特征分布对齐。与在图像级别执行的传统图像到图像翻译方法不同,语义GAN通过语义生成器模块(SGM)和语义歧视器模块(SDM)在语义级别执行生成和歧视。对于目标图像(场景文本图像),语义生成器模块生成简单的语义特征,这些功能与支持图像(清晰的文本图像)共享相同的特征分布。语义鉴别器模块用于区分支​​持域和目标域之间的语义特征。此外,平衡的注意模块旨在减轻注意力漂移的问题。平衡注意模块首先根据视觉瞥见向量和语义瞥见向量学习平衡参数,然后执行平衡操作以获得平衡的瞥见向量。在六个基准测试的实验,包括常规数据集,即IIIT5K,SVT,ICDAR2013和不规则数据集,即ICDAR2015,SVTP,cute80,验证我们提出的方法的有效性。
translated by 谷歌翻译
Question Answering (QA) is a longstanding challenge in natural language processing. Existing QA works mostly focus on specific question types, knowledge domains, or reasoning skills. The specialty in QA research hinders systems from modeling commonalities between tasks and generalization for wider applications. To address this issue, we present ProQA, a unified QA paradigm that solves various tasks through a single model. ProQA takes a unified structural prompt as the bridge and improves the QA-centric ability by structural prompt-based pre-training. Through a structurally designed prompt-based input schema, ProQA concurrently models the knowledge generalization for all QA tasks while keeping the knowledge customization for every specific QA task. Furthermore, ProQA is pre-trained with structural prompt-formatted large-scale synthesized corpus, which empowers the model with the commonly-required QA ability. Experimental results on 11 QA benchmarks demonstrate that ProQA consistently boosts performance on both full data fine-tuning, few-shot learning, and zero-shot testing scenarios. Furthermore, ProQA exhibits strong ability in both continual learning and transfer learning by taking the advantages of the structural prompt.
translated by 谷歌翻译
使用图像文本对的对比语言图像预测(剪辑)在零拍摄和传输学习设置中的图像分类中取得了令人印象深刻的结果。但是,我们表明,直接应用此类模型以识别对象检测的图像区域导致由于域移位导致的性能差:剪辑训练以与文本描述的整体匹配,而不捕获图像之间的细粒度对齐地区和文本跨度。为了缓解此问题,我们提出了一种称为RegionClip的新方法,可显着扩展剪辑以学习区域级视觉表示,从而在图像区域和文本概念之间实现细粒度对齐。我们的方法利用剪辑模型将图像区域与模板标题匹配,然后预先列出我们的模型以对准要素空间中的这些区域文本对。将预磨料模型转移到开放词汇对象检测任务时,我们的方法显着优于3.8 AP50和2.2 AP的最新技术,分别用于COCO和LVIS数据集的新型类别。更多,学习区域表示支持对象检测的零拍摄推断,显示了对COCO和LVIS数据集的有希望的结果。我们的代码可在https://github.com/microsoft/regionclip上获得。
translated by 谷歌翻译
大型预训练的语言模型已经表现出了产生现实文本的强大功能。但是,控制生成结果仍然具有挑战性。以前的方法,例如提示远远不足,这限制了语言模型的使用。为了解决这一挑战,我们提出了一种创新的方法,逆提示,更好地控制文本生成。逆提示的核心思想是使用生成的文本来在波束搜索期间反转提示,这增强了提示和生成文本之间的相关性,并提供了更好的可控性。经验上,我们预先培训了大规模的汉语模型,在开放式诗歌生成和开放式长形问题的任务上使用人力评估进行系统研究。我们的研究结果表明,我们的提出方法显着优于基线,而我们的发电质量与某些任务中的某些任务接近人类性能。叙述者可以在https://pretrain.aminer.cn/apps/poetry.html上尝试我们的诗歌生成演示,而我们的QA演示可以在https://pretrain.aminer.cn/app/qa找到。对于研究人员来说,代码是在https://github.com/thudm/inverseprompting中提供的。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译